ABSTRACT

We have analyzed the polyadenylation sites for the small subunit 
of ribulose bisphosphate carboxylase and chlorophyll a/b binding 
protein genes of Petunia (Mitchell) and the bronze gene of Zea
mays. Sequence analysis of multiple cDNA clones revealed that
polyadenylation of the transcripts occurred at either 2 or 3 
sites for all three groups of genes. In the examples where 3 
polyadenylation sites were detected, the middle site was the one 
predominantly used. Putative polyadenylation signals preceding 
the poly A tails diverged significantly from the animal consensus
sequence AATAAA. In all the genes examined the first A residue
in the poly A tail of the cDNA clones corresponded to an A 
residue in the homologous genomic sequence. 

INTRODUCTION 

In animals and viruses, RNA polymerase II terminates
transcription far downstream of the DNA sequences coding for the 
3' mRNA termini (1-3). Processing and polyadenylation of the 
mRNA then occurs predominantly at one site (1). A consensus
sequence, AATAAA, is found in the majority of animal genes
sequenced, 10 to 33 nucleotides upstream from the site of
polyadenylation (4). Mutational analysis has shown that the 
hexanucleotide AATAAA plays a major role in selecting the site at 
which processing and polyadenylation occurs (5,6). The consensus 
sequence AATAAA cannot be the only recognition signal determining 
the cleavage and polyadenylation site of a mature mRNA since it 
occurs unrecognized in the coding region and introns of several 
genes (7). There is now evidence suggesting that sequences 
immediately distal to the mature mRNA are also essential for 
viral and mammalian mRNA formation (8,9). The signal YGTGTTYY 
(Y = pyrimidine) is commonly present in mammalian genes (9),
approximately 30 bp downstream from the AATAAA sequence. Even
when this sequence is not present an overrepresentation of the
tri-nucleotide TGT (now termed the G/T cluster) is often found 
downstream of the AATAAA sequence. The G/T cluster may therefore 
have a function in the RNA processing events. 
Another sequence implicated in the positioning of the 3'
cleavage site is CAYTG, usually found close to the site of poly A
addition. This may base pair with a small nuclear RNA, U4 (10) 
and may implicate a function for U4 in the 3' cleavage process. 
Although cleavage and polyadenylation occur predominantly at one 
site in the animal and viral messages studied, there are some 
exceptions (reviewed in 11). Examples where multiple 
polyadenylation sites have been observed include the bovine 
prolactin mRNA (12), the mouse ribosomal protein L30 mRNA (13), 
the hepatitis B virus surface antigen mRNA (14) and the 
Drosophila tropomyosin mRNA (15). 
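
The sequence elements discussed above (AATAAA, YGTGTTYY and CAYTG) can be located in a 3' flanking sequence by a simple pattern search. The following Python sketch is illustrative only: the function name find_motif and the example sequence are hypothetical and are not taken from any of the genes cited.

```python
import re

# IUPAC code used in the text: Y = pyrimidine (C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]"}


def find_motif(motif, sequence):
    """Return the 0-based start of every match of `motif` in `sequence`.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    return [m.start() for m in pattern.finditer(sequence)]


# Hypothetical 3' flanking sequence, invented purely for illustration.
example = "GCCAATAAAGCTTGCATCGATCGGATCCTAGCACTGAGTCCGTGTTCTGGA"

for motif in ("AATAAA", "CAYTG", "YGTGTTYY"):
    print(motif, find_motif(motif, example))
```

A search of this kind reports only where a motif occurs; whether an occurrence is functional cannot be decided from the sequence alone, as the unrecognized AATAAA sequences in coding regions and introns show.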

Processing and polyadenylation of plant mRNAs have not been
studied in detail and it has been presumed that the processes in 
plants would follow the general rules proposed for animal genes. 
The putative polyadenylation signals identified in plant genes 
often diverge considerably from the animal consensus AATAAA 
(16,17). Experiments using plant mRNAs have frequently shown
multiple protected fragments in 3' S1 protection experiments.
These have generally been attributed to "breathing" artifacts
that correspond to AT rich regions in the gene (18,19) leading to 
the conclusion that as in animal genes there is predominantly one 
polyadenylation site in plant genes. 

We have studied the polyadenylation sites of three distinct 
groups of plant genes by sequence analysis of multiple cDNA
clones. All three groups of genes, the small subunit (rbcS) 
genes of Petunia (Mitchell) ribulose bisphosphate carboxylase 
(RuBPCase), the chlorophyll a/b binding protein (Cab) genes of
Petunia (Mitchell) and the bronze gene of Zea mays show multiple 
polyadenylation sites suggesting that this may be a general 
phenomenon for plant genes. 

METHODS 

Plant material 

The Petunia (Mitchell) strain is a doubled haploid produced by
anther culture from a hybrid between Petunia hybrida var Rose of
Heaven and Petunia axillaris (20). The plants were grown under
greenhouse conditions. RNA was isolated from the young leaves of
plants approximately 10 weeks old. 

The Zea mays line used in this study was a B;Pl version of the
inbred W22 which expresses strong anthocyanin pigmentation in the 
husks and leaves. The plants were grown under greenhouse 
conditions. RNA was isolated from husk tissue of plants 
approximately 9 weeks old.

cDNA clone isolation and characterization

RNA was isolated as previously described (21). A cDNA library in
λgt10 was constructed from 10 µg of petunia leaf poly A RNA using
the method of Huynh et al (22). The primary library of 12 x 10^3
phage (50% of which contained inserts) was amplified according to
Huynh et al (22). The titre of the amplified library was 
Sxlfl^/ml. Plaques were transferred to nitrocellulose filters 
according to the method of Benton and Davis (23). The filters 
were hybridized with either rbcS or Cab cDNA clones or rbcS gene 
specific probes as described previously (21,24-27). The cDNA
inserts were subcloned into M13 phage (28) and sequenced using 
the dideoxy sequencing method of Sanger et al (29). 
A cDNA library in pBR322 (30) was constructed from 20 µg of maize
husk poly A RNA using the methods described in Maniatis et al
(31). The cDNA was ligated into the EcoRI site of pBR322 after
the addition of EcoRI linkers (N.E. Biolabs). The library of
32000 colonies (70% of which contained inserts) was transferred 
to nitrocellulose filters according to the method of Grunstein 
and Hogness (32).

The filters were hybridized with a genomic clone of the bronze 
gene (33). The bronze cDNA clones were subcloned into M13 phage 
(28) and sequenced using the dideoxy sequencing method of Sanger 
et al (29).

RESULTS 

We have characterized transcripts from 6 plant genes. These fall
into 3 groups: in group 1 are 3 genes from the petunia rbcS
multi-gene family, in group 2 are 2 genes from the petunia Cab
multi-gene family and in group 3 is the single copy bronze gene
from maize. Our characterization of the 3' regions of these
genes has involved the sequencing of multiple, independent cDNA
clones complementary to each of these 6 genes. In all, 37 cDNA
clones have been sequenced.

Analysis of the polyadenylation sites of three petunia rbcS genes 
The multi-gene family encoding the small subunit of RuBPCase in 
Petunia (Mitchell) consists of eight genes. The organization and 
expression of the individual genes has been described previously 
(21,24,25). We have analyzed the polyadenylation sites of two of 
these genes, SSU301 and SSU511 in detail and one other, SSU211 in
less detail. The other rbcS genes have not been examined. The 
multiple cDNA clones used in the sequence analysis were isolated 
from a λgt10 library constructed from petunia leaf RNA (21).
Clones corresponding to the gene SSU301 were identified by their 
hybridization to a 61bp gene specific probe isolated from the 3' 
untranslated region of the SSU301 gene (21). Clones 
corresponding to the genes SSU511 and SSU211 were identified by 
hybridization to the cDNA clone pSSU51 (24,26) after high
stringency washing and subsequent sequence analysis of the 3' 
untranslated tail region. Twelve cDNA clones corresponding to 
the gene SSU301, 13 corresponding to SSU511 and 2 corresponding 
to SSU211 were analyzed. 

A comparison of the level of expression of the different rbcS 
genes (21) has previously indicated that the frequency of cloning 
in the λgt10 library of cDNA clones for the different genes
correlated with the level of mRNA for each gene as measured by 
Northern analysis. There was therefore no evidence of any
differential cloning of the cDNA clones corresponding to the 
different rbcS genes. 

Figure 1 shows the nucleotide sequence of the 3' untranslated
tail regions of three petunia rbcS genes and the polyadenylation 
sites of the analyzed cDNA clones. The cDNA clones for both of 
the genes SSU301 and SSU511 fall into groups with different 
polyadenylation sites. The numbers in brackets on Fig. 1 
indicate the number of cDNA clones sequenced belonging to each 
polyadenylation group. The two cDNA clones analyzed 
corresponding to SSU211 also had different poly A sites. Since
the poly A tails of all the cDNA clones analyzed were at least 50
nucleotides long and do not occur in a region corresponding to an
AT rich area in the genomic sequence we are confident that
correct priming occurred with the oligo dT during the cDNA
preparation, and that the different groups of cDNA clones
represent true differences in the poly A addition site on the
mRNAs. In the two examples where many cDNA clones have been
analyzed for each gene, SSU301 and SSU511, the middle one of
the three polyadenylation sites is the one predominantly used.
Eight out of 12 of the cDNA clones for SSU301 and 7 out of 13
clones for SSU511 had the poly A tail added in the middle
position. The sites where polyadenylation occurs in the mRNA are
spread over quite large distances. The three polyadenylation
sites detected for SSU301 are 69 nucleotides apart and those for
SSU511 are separated by 39 nucleotides, whereas the two detected
for SSU211 are 95 nucleotides apart. A high degree of
variability in the positioning of the poly A tails therefore
occurs between these highly related genes.

Figure 1. The nucleotide sequence of the 3' untranslated tail
regions of three different petunia rbcS genes and the
polyadenylation sites of multiple cDNA clones.
The position of the given nucleotide sequence relative to the
translation termination codon is indicated at the beginning of
each sequence. The genomic clones, SSU301 and SSU211 have been
described previously (25). We have not isolated the genomic
clone for SSU511, so the nucleotide sequence presented has been
derived from a combination of the sequence of the cDNA clones.
The cDNA clones were isolated from a λgt10 library of petunia RNA
(21). cDNA clones corresponding to the gene SSU301 were
identified by their hybridization to a gene specific probe (21).
cDNA clones corresponding to the genes SSU511 and SSU211 were
identified by hybridization to the cDNA clone pSSU51 (24,26)
after high stringency washing and subsequent sequence analysis.
Putative polyadenylation signals have been indicated for each
group of cDNA clones. The numbers in brackets at the end of each
group of cDNA clones indicate the number of cDNA clones sequenced
belonging to each polyadenylation group. The sequences which may
represent the G/T cluster sequences have been underlined.

A putative polyadenylation signal for each group of cDNA clones 
is illustrated in Fig. 1. These were chosen as being the
sequences located between 15 and 29 nucleotides upstream of the
poly A tail (the consensus distance in animal genes is 10 to 33
nucleotides (4)) which most closely resemble the animal consensus
sequence AATAAA (4) and which start with an A residue. One
putative signal which is present in all three rbcS genes is
AATAAT. This putative signal most closely resembles the animal
consensus sequence and it is positioned upstream of the poly A
site used predominantly in both SSU301 and SSU511. All the other 
putative polyadenylation signals identified diverge significantly 
from AATAAA. 
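
The selection criteria described above can be expressed as a short search procedure. The Python sketch below is one possible reading of those criteria, not the procedure actually used: the function names, the use of a mismatch count as the measure of resemblance, the convention that distance is counted from the first base of the hexamer, and the example sequence are all assumptions made for illustration.

```python
ANIMAL_CONSENSUS = "AATAAA"


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pick_putative_signal(upstream, min_dist=15, max_dist=29):
    """Pick the hexamer in the window that best matches AATAAA.

    `upstream` ends immediately before the first A of the poly A tail.
    Distance is counted from the first base of the hexamer to the poly A
    site (an assumption; the text does not say which end is used).
    Only hexamers starting with A are considered. Ties keep the hexamer
    nearest the poly A site. Returns (hexamer, distance, mismatches) or None.
    """
    best = None
    n = len(upstream)
    for dist in range(min_dist, max_dist + 1):
        start = n - dist
        if start < 0:
            break  # window extends past the available sequence
        hexamer = upstream[start:start + 6]
        if hexamer[0] != "A":
            continue
        candidate = (hexamer, dist, hamming(hexamer, ANIMAL_CONSENSUS))
        if best is None or candidate[2] < best[2]:
            best = candidate
    return best


# Hypothetical 30-nucleotide upstream sequence, invented for illustration.
upstream = "GGCTCGTCCGAATAATGCTTGGCCTTGCGT"
print(pick_putative_signal(upstream))
```

The mismatch count is only one way of scoring resemblance to AATAAA; a different scoring rule or distance convention could select a different hexamer from the same window.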

One feature which is common to all the rbcS genes (including 
other genes where only one cDNA clone has been analyzed, data not 
shown) is that the nucleotide in the genomic sequence which 
corresponds to the first A in the poly A tails of the cDNA 
clones, is an A residue. 
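
The comparison between a cDNA clone and the genomic sequence can be sketched as follows. This Python fragment is a simplified illustration under stated assumptions: the function name, the minimum tail length parameter and the sequences are hypothetical, and the cDNA is assumed to match the genomic sequence exactly up to the poly A tail.

```python
def genomic_residue_at_tail_start(cdna, genomic, min_tail=10):
    """Return (index, residue) of the genomic position aligned with the
    first A of the terminal A run in the cDNA.

    The terminal run of A residues in the cDNA is treated as the poly A
    tail; the remaining cDNA body is located in the genomic sequence by
    exact matching (first occurrence).
    """
    body = cdna.rstrip("A")
    if len(cdna) - len(body) < min_tail:
        raise ValueError("no poly A tail of the required length")
    start = genomic.find(body)
    if start == -1:
        raise ValueError("cDNA body not found in genomic sequence")
    site = start + len(body)
    if site >= len(genomic):
        raise ValueError("genomic sequence ends before the poly A site")
    return site, genomic[site]


# Hypothetical sequences, invented for illustration.
genomic = "CTGGTCAATAATGTCCTAGCATTGCATCGG"
cdna = genomic[:20] + "A" * 50
print(genomic_residue_at_tail_start(cdna, genomic))
```

When the genomic residue at this position is itself an A, the cDNA sequence alone cannot show whether that A was transcribed from the gene or added as the first residue of the tail, so the cleavage site is defined only to within the run of genomic A residues.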

Also illustrated in Fig. 1 are sequences which may represent the
G/T cluster sequences which have been identified in animal genes
(9). These are positioned immediately downstream of the poly A
addition site. They are not, however, present at all of the poly
A addition sites, nor after all the poly A sites which are
predominantly used. Their importance in plant gene mRNA
formation remains questionable. We have not found sequences
corresponding to CAYTG, the sequence found in animal genes which 
can base pair with the small nuclear RNA, U4.

Figure 2. The nucleotide sequence of the 3' untranslated tail
regions of two different petunia Cab genes and the
polyadenylation sites of multiple cDNA clones.
The position of the given nucleotide sequence relative to the
translation termination codon is indicated at the beginning of
each sequence. The genomic clone Cab 91R has been described
previously (27,34). The genomic clone for Cab 29 has not been
isolated and the nucleotide sequence presented has been derived
from a combination of the sequence of the cDNA clones. The cDNA
clones were isolated by hybridization to a Cab cDNA clone, Cab 3
(27) and subsequent sequence analysis. Putative polyadenylation
signals have been indicated for each group of cDNA clones. The
numbers in brackets at the end of each group of cDNA clones 
indicate the number of cDNA clones sequenced belonging to each 
polyadenylation group. The sequences which may represent the G/T 
cluster have been underlined. 



Analysis of the polyadenylation sites of two Cab genes. 
The chlorophyll a/b binding proteins of Petunia (Mitchell) are 
also encoded by a multi-gene family consisting of at least 16 
genes (27,34). We have examined the polyadenylation sites of two
Cab genes by sequencing two independent cDNA clones for each
gene. The cDNA clones were isolated from the λgt10 library
constructed from petunia leaf RNA by hybridization to pCab 3 
probe (27), which at normal stringency hybridizes to all of the 
Cab genes. The cDNA clones described here were identified as 
corresponding to different Cab genes by subsequent sequence 
analysis. Figure 2 shows the nucleotide sequence of two cDNA 
clones for each of two different Cab genes, Cab 91R and Cab 29.
The two cDNA clones for each gene have different polyadenylation
sites, 28 nucleotides apart in Cab 91R and 39 nucleotides apart
in Cab 29. Using the same criteria as for the rbcS genes, a
putative polyadenylation signal has been proposed for each cDNA
clone and these are indicated in Fig. 2. Only one of the 4
putative signals in the Cab genes is AATAAT, the sequence which
is present in all the rbcS genes. All other putative signals
also diverge significantly from the animal sequence AATAAA (4).
Where it can be determined, the beginning of the poly A tail in
the cDNA clones corresponds to an A residue in the genomic
sequence. This is one feature common to both Cab and rbcS genes
in petunia.

Figure 3. The nucleotide sequence of the 3' untranslated tail
region of the maize bronze gene and the polyadenylation sites of
multiple cDNA clones.
The position of the given nucleotide sequence relative to the
translation termination codon is indicated at the beginning of
the sequence. The genomic clone for the Bz allele present in the
B;Pl line from which the cDNA clones were obtained has not been
isolated. The bronze sequence presented has been derived from a
combination of the sequence of the 6 cDNA clones and is 96%
homologous to the sequence of the 3' untranslated region of the
Bz-McC allele already cloned (33 and unpublished results). The
cDNA clones from a pBR322 library constructed from husk tissue
RNA were identified by hybridization to the genomic clone.
Putative polyadenylation signals have been indicated for each
group of cDNA clones. The numbers in brackets at the end of each
group of cDNA clones indicate the number of cDNA clones sequenced
belonging to each polyadenylation group.

Analysis of the polyadenylation sites of the maize bronze gene. 
The gene encoding UDP-glucose:flavonoid 3-O-glucosyltransferase
(UFGT) in Zea mays is present in a single copy (33,35). Six cDNA 
clones for this gene were isolated from a cDNA library 
constructed in pBR322 from maize husk tissue RNA. The clones were 
identified by hybridization to pAGS551, a 0.84 kb genomic fragment
corresponding to the 3' end of the transcribed region of the 
bronze gene (33). The 6 cDNA clones could be classified into 
three groups on the basis of their polyadenylation sites. Figure
3 shows a comparison of the nucleotide sequence of the three
groups of cDNA clones. The numbers in brackets on Fig. 3 indicate
the number of cDNA clones sequenced in each polyadenylation
group. Although the sample size of cDNA clones is small, it
appears that, as for the rbcS genes, the middle polyadenylation
site is the one predominantly used. This was confirmed by S1
protection experiments (data not shown). The three 
polyadenylation sites are separated by 130 nucleotides. 
Putative polyadenylation signals have been indicated on Fig. 3. 
These were chosen using the same criteria as were described for 
the rbcS and Cab genes. The three polyadenylation signals 
chosen, ATACAT, AAGCAT and AAAATA, diverge more from the animal
consensus sequence, AATAAA, than the polyadenylation signals
identified for either the rbcS or Cab genes. 

All the cDNA clones corresponding to the bronze gene have an A 
residue in the derived genomic sequence in the position that 
corresponds to the first A residue in the poly A tail. This is
the one feature which has been common to all three groups of 
plant genes analyzed here. 

DISCUSSION 

In this study we have compared the polyadenylation sites for 
three groups of plant genes, two multi-gene families from Petunia 
(Mitchell), the rbcS and Cab genes and a single copy gene from 
Zea mays, the bronze gene. All three groups of genes show
multiple sites where polyadenylation of the transcript had
occurred in vivo. In the two groups of genes where many cDNA
clones have been analyzed, the rbcS and bronze genes, three 
polyadenylation sites were detected, and in both cases the middle 
site is the one which is predominantly used. Nevertheless the 
outer polyadenylation sites are used at a frequency which is much
higher than those of the alternative polyadenylation sites which 
have been found in some animal genes. 

The data collected in this study indicate that there are no
simple rules concerning the positioning of the poly A tail in
plant genes. The multiple polyadenylation sites can be as close
as 18 nucleotides or as distant as 130 nucleotides. They are
preceded by putative polyadenylation signals which are not
conserved between the genes we have analysed, not even within a
multi-gene family (summarized in Table 1). The most conserved
plant polyadenylation signal is AATAAT, which is found in all the
rbcS genes and one of the Cab genes. This is also the
polyadenylation signal most closely resembling the animal
consensus sequence, the significance of which remains unclear. A
putative plant polyadenylation consensus sequence can be
formulated if we compare all the putative polyadenylation signals
identified in the cDNA clones examined in this study. This is
outlined in Table 1. Only A or T residues are found at positions
1, 2, 5 and 6 in the consensus sequence.

Table 1. Putative polyadenylation signals identified in the rbcS,
Cab and bronze cDNA clones. The putative polyadenylation
consensus sequence was formulated from the 15 putative
polyadenylation signals identified in the cDNA clones examined in
this study.

Putative poly A signal    Gene
AACCAA                    rbcS
AATAAT                    rbcS, Cab
ATATAA                    rbcS
AATCAA                    rbcS
ATACTA                    rbcS
ATAAAA                    rbcS
ATGAAA                    Cab
AAGCAT                    Cab, bronze
ATTAAT                    Cab
ATACAT                    bronze
AAAATA                    bronze

Putative plant poly A consensus sequence (number of occurrences
of each residue at each position):

Position    1      2      3      4      5      6
            A 15   A 9    T 6    A 8    A 13   T 8
                   T 6    A 5    C 6    T 2    A 7
                          G 3    T 1
                          C 1
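
The residue tallies underlying the consensus in Table 1 can be recomputed from the listed hexamers. In the Python sketch below the number of times each hexamer is counted is an assumption inferred from the text: AATAAT is counted four times (three rbcS genes and one Cab gene), AAGCAT twice (one Cab gene and bronze) and every other hexamer once, which gives the stated total of 15 signals. The names SIGNAL_COUNTS and position_counts are illustrative.

```python
from collections import Counter

# Hexamers from Table 1 with their assumed number of occurrences
# (inferred from the text; the total is the 15 signals stated there).
SIGNAL_COUNTS = {
    "AACCAA": 1, "AATAAT": 4, "ATATAA": 1, "AATCAA": 1, "ATACTA": 1,
    "ATAAAA": 1, "ATGAAA": 1, "AAGCAT": 2, "ATTAAT": 1, "ATACAT": 1,
    "AAAATA": 1,
}


def position_counts(signal_counts):
    """Tally the residues found at each of the six hexamer positions."""
    columns = [Counter() for _ in range(6)]
    for hexamer, n in signal_counts.items():
        for i, base in enumerate(hexamer):
            columns[i][base] += n
    return columns


for pos, col in enumerate(position_counts(SIGNAL_COUNTS), start=1):
    print(pos, dict(col.most_common()))
```

Under these assumed multiplicities every position sums to 15, and only A or T occurs at positions 1, 2, 5 and 6, in agreement with the text.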

The one general rule we can propose based on the three groups of 
genes analysed here is that the first A residue in the poly A 
tail of the cDNA clones corresponds to an A residue in the
homologous genomic sequence. This rule, however, does not appear
to hold up when other published plant gene sequences are included
in the analysis (17). Perhaps the one general conclusion we
could make is that processing and polyadenylation events for
plant genes are capable of a high degree of flexibility. 
It will be interesting to see if variability in the processing 
and polyadenylation of the plant mRNAs affects the level of mRNA
stability. A preliminary comparison of the expression of the 
individual rbcS genes (21) with the information on their 
polyadenylation sites suggests there is no direct correlation. 
The two rbcS genes, SSU301 and SSU211, are expressed to very 
different levels in petunia leaf tissue, accounting for 47.3% and 
1.9% respectively of the total rbcS RNA; however, they both have
multiple polyadenylation sites, one of which is preceded by the
consensus AATAAT. Variability in the 3' processing events would
not therefore appear to explain the differences in the steady
state mRNA levels of these two genes.